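We propose a new method for calculating error rates in automatic speech recognition (ASR). The new metric targets languages that contain half characters and in which the same character can be written in different forms. We implement our methodology for Hindi, one of the main Indic languages, and we believe the approach extends to other similar languages with large character sets. We call our metrics Alternate Word Error Rate (AWER) and Alternate Character Error Rate (ACER). We train our ASR models using wav2vec 2.0 \cite{baevski2020wav2vec}. In addition, we use language models to improve model performance. Our results show a significant improvement in analyzing error rates at the word and character level, and the interpretability of the ASR system improves by up to $3$\% in AWER and $7$\% in ACER for Hindi. Our experiments suggest that in languages with complex pronunciation there are multiple ways of writing a word without changing its meaning. In such cases AWER and ACER are more useful metrics than WER and CER. Furthermore, we open-source a new 21-hour benchmark dataset for Hindi together with the new metric scripts.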
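To make the idea of an alternate error rate concrete, here is a minimal sketch; it is not the authors' metric script. Standard WER is the word-level edit distance divided by the reference length. The alternate variant below first maps every accepted spelling to one canonical form, so equivalent spellings are not counted as errors. The `ALTERNATES` table, its single entry, and all function names are assumptions made for illustration.

```python
from typing import Dict, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Plain word error rate: edit distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical table: spelling variant -> canonical spelling.
# The entry maps the anusvara spelling of "Hindi" (हिंदी)
# to the half-character spelling (हिन्दी).
ALTERNATES: Dict[str, str] = {
    "\u0939\u093f\u0902\u0926\u0940": "\u0939\u093f\u0928\u094d\u0926\u0940",
}


def canonicalize(text: str, alternates: Dict[str, str]) -> str:
    """Replace every known variant with its canonical spelling."""
    return " ".join(alternates.get(word, word) for word in text.split())


def alternate_wer(reference: str, hypothesis: str,
                  alternates: Dict[str, str]) -> float:
    """WER computed after canonicalizing both sides (illustrative AWER)."""
    return wer(canonicalize(reference, alternates),
               canonicalize(hypothesis, alternates))
```

If a four-word reference and its hypothesis differ only in which of the two spellings they use, `wer` reports one error in four words while `alternate_wer` reports none. A character-level analogue would apply the same canonicalization before a character-level edit distance.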
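We study the effect of applying a language model (LM) to the output of automatic speech recognition (ASR) systems for Indic languages. We fine-tune wav2vec $2.0$ models for $18$ Indic languages and adjust the results with language models trained on text derived from a variety of sources. Our findings show that the average character error rate (CER) decreases by $28$\% and the average word error rate (WER) decreases by $36$\% after decoding with an LM. We show that a large LM may not provide a substantial improvement over a diverse one. We also demonstrate that high-quality transcriptions can be obtained on domain-specific data without retraining the ASR model, and we report results for the biomedical domain.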
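One common way to apply an LM to ASR output is to rescore an n-best list with a weighted sum of acoustic and LM log-probabilities. The sketch below illustrates that general idea only; the abstract does not specify the decoding scheme. The `Hypothesis` type, the weights `alpha` and `beta`, and the toy unigram LM are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Hypothesis:
    text: str
    acoustic_logprob: float  # log-probability assigned by the ASR model


def make_unigram_lm(counts: Dict[str, int],
                    unk_count: float = 0.5) -> Callable[[str], float]:
    """Build a toy unigram LM; unseen words get a small pseudo-count."""
    total = sum(counts.values()) + unk_count

    def logprob(text: str) -> float:
        return sum(math.log(counts.get(word, unk_count) / total)
                   for word in text.split())

    return logprob


def rescore(hyps: List[Hypothesis],
            lm_logprob: Callable[[str], float],
            alpha: float = 0.5,
            beta: float = 0.0) -> Hypothesis:
    """Return the hypothesis with the best combined score.

    score = acoustic + alpha * LM + beta * word_count
    (alpha is the LM weight, beta a word-insertion bonus; both assumed).
    """
    def score(h: Hypothesis) -> float:
        return (h.acoustic_logprob
                + alpha * lm_logprob(h.text)
                + beta * len(h.text.split()))

    return max(hyps, key=score)
```

Because the LM is a separate component, swapping in one trained on in-domain text changes the ranking without touching the acoustic model, which is the property the abstract relies on for domain-specific data.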
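Training multilingual automatic speech recognition (ASR) systems is challenging because acoustic and lexical information is typically language-specific. Training a multilingual system for Indic languages is even harder because open-source datasets and published results for different approaches are scarce. We compare the performance of an end-to-end multilingual speech recognition system with that of monolingual models conditioned on language identification (LID). The decoding information from the multilingual model is used for language identification and then combined with the monolingual models to obtain a 50% improvement in WER across languages. We also propose a similar technique for the code-switching problem and achieve WERs of 21.77 and 28.27 on Hindi-English and Bengali-English, respectively. Our work discusses how transformer-based ASR, in particular wav2vec 2.0, can be applied to build multilingual and code-switched ASR for Indic languages.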
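The two-pass idea can be sketched as follows. This is one plausible reading under stated assumptions, not the authors' pipeline: the language is guessed here from the Unicode script of the multilingual model's first-pass transcript, and the audio is then re-decoded by the matching monolingual model. The function names, the two-language `SCRIPT_RANGES` table, and the callable model interface are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Optional

# Unicode blocks for two scripts (Devanagari for Hindi, Bengali for Bengali).
SCRIPT_RANGES = {
    "hi": (0x0900, 0x097F),
    "bn": (0x0980, 0x09FF),
}


def identify_language(text: str) -> Optional[str]:
    """Vote per character on the script; return None if no script matches."""
    votes: Counter = Counter()
    for ch in text:
        code_point = ord(ch)
        for lang, (low, high) in SCRIPT_RANGES.items():
            if low <= code_point <= high:
                votes[lang] += 1
    return votes.most_common(1)[0][0] if votes else None


def transcribe(audio: bytes,
               multilingual: Callable[[bytes], str],
               monolingual: Dict[str, Callable[[bytes], str]]) -> str:
    """First pass with the multilingual model, second pass with a
    monolingual model when the identified language has one."""
    first_pass = multilingual(audio)
    lang = identify_language(first_pass)
    if lang in monolingual:
        return monolingual[lang](audio)
    return first_pass  # fall back to the multilingual transcript
```

Script-based voting only separates languages that use different scripts; languages sharing a script would need a different LID signal.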
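We present Vakyansh, an end-to-end toolkit for speech recognition in Indic languages. India is home to almost 121 languages and around 1.25 billion speakers, yet most of these languages are low-resource in terms of both data and pretrained models. Through Vakyansh, we introduce automatic pipelines for data creation, model training, model evaluation, and deployment. We create 14,000 hours of speech data in 23 Indic languages and train wav2vec 2.0 based pretrained models. These pretrained models are then fine-tuned to create state-of-the-art speech recognition models for 18 Indic languages, followed by language models and punctuation restoration models. We open-source all of these resources with the aim of inspiring the speech community to build speech-first applications in Indic languages using our ASR models.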
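Systemic lupus erythematosus (SLE) is an autoimmune disease in which the patient's immune system starts attacking healthy tissues of the body. Lupus nephritis (LN) refers to the inflammation of kidney tissue caused by these attacks, which results in renal failure. The International Society of Nephrology/Renal Pathology Society (ISN/RPS) has released a classification system based on the various patterns observed during renal injury in SLE. Traditional methods require meticulous pathological assessment of renal biopsies and are time-consuming. Recently, computational techniques have helped alleviate this problem through virtual microscopy, or whole slide imaging (WSI). Using deep learning and modern computer vision techniques, we propose a pipeline that automates 1) the detection of the various glomerular patterns present in these whole slide images and 2) their classification using the extracted glomerular features.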
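We present CLSRIL-23, a self-supervised audio pretrained model that learns cross-lingual speech representations from raw audio across 23 Indic languages. It is built on top of wav2vec 2.0, which is trained by solving a contrastive task over masked latent speech representations while jointly learning a quantization of the latents shared across all languages. We compare the language-wise loss during pretraining to assess the effects of monolingual and multilingual pretraining. We also compare performance on some downstream fine-tuning tasks for speech recognition. Our experiments show that multilingual pretraining outperforms monolingual pretraining, both in learning speech representations and in downstream performance. When a multilingual pretrained model is used for fine-tuning in Hindi, we observe a decrease of 5% in WER and 9.5% in CER. All code and models are open-sourced. CLSRIL-23 is a model trained on $23$ languages and almost 10,000 hours of audio data to facilitate research in speech recognition for Indic languages. We hope that new state-of-the-art systems will be created using the self-supervised approach, especially for low-resource Indic languages.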
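The contrastive task mentioned above can be illustrated with a small sketch. For one masked time step, the loss asks the context vector to be closer (in cosine similarity) to the true quantized latent than to a set of distractors. This is a minimal pure-Python illustration of that loss for a single time step, not the training code; the function names and the default `temperature` of 0.1 are assumptions of this sketch.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(context: Sequence[float],
                     target: Sequence[float],
                     distractors: List[Sequence[float]],
                     temperature: float = 0.1) -> float:
    """-log softmax of the true target among target + distractors."""
    logits = [cosine(context, target) / temperature]
    logits += [cosine(context, d) / temperature for d in distractors]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denominator - logits[0]
```

The loss approaches zero when the context vector aligns with the true target and not with the distractors, and grows when a distractor is the better match.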
Existing federated classification algorithms typically assume the local annotations at every client cover the same set of classes. In this paper, we aim to lift such an assumption and focus on a more general yet practical non-IID setting where every client can work on non-identical and even disjoint sets of classes (i.e., client-exclusive classes), and the clients have a common goal which is to build a global classification model to identify the union of these classes. Such heterogeneity in client class sets poses a new challenge: how to ensure different clients are operating in the same latent space so as to avoid the drift after aggregation? We observe that the classes can be described in natural languages (i.e., class names) and these names are typically safe to share with all parties. Thus, we formulate the classification problem as a matching process between data representations and class representations and break the classification model into a data encoder and a label encoder. We leverage the natural-language class names as the common ground to anchor the class representations in the label encoder. In each iteration, the label encoder updates the class representations and regulates the data representations through matching. We further use the updated class representations at each round to annotate data samples for locally-unaware classes according to similarity and distill knowledge to local models. Extensive experiments on four real-world datasets show that the proposed method can outperform various classical and state-of-the-art federated learning methods designed for learning with non-IID data.
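Casting classification as matching can be sketched in a few lines: a sample is assigned to the class whose representation is most similar to the sample's representation. The sketch below shows only this inference step, with plain vectors standing in for the outputs of the data encoder and the label encoder; the function names and the choice of cosine similarity are assumptions, and no federated training is shown.

```python
import math
from typing import Dict, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u))
                  * math.sqrt(sum(b * b for b in v)))


def predict(data_repr: Sequence[float],
            class_reprs: Dict[str, Sequence[float]]) -> str:
    """Return the class name whose representation best matches the sample.

    class_reprs maps class names (the shared natural-language anchors)
    to vectors assumed to come from a label encoder; it can cover the
    union of all clients' classes.
    """
    return max(class_reprs,
               key=lambda name: cosine(data_repr, class_reprs[name]))
```

Because the candidate set is just the keys of `class_reprs`, a client can score samples against classes it never annotated locally, which is what makes pseudo-labelling of locally-unaware classes possible.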
The rise in data has led to the need for dimension reduction techniques, especially for non-scalar data arising in time series, natural language processing, and computer vision. In this paper, we specifically investigate dimension reduction for time series through functional data analysis. Current methods for dimension reduction in functional data are functional principal component analysis and functional autoencoders, which are limited to linear mappings or to scalar representations of the time series and are therefore inefficient. In real data applications, the nature of the data is much more complex. We propose a non-linear function-on-function approach, consisting of a functional encoder and a functional decoder, that uses continuous hidden layers of continuous neurons to learn the structure inherent in functional data, addressing these limitations of existing approaches. Our approach gives a low-dimensional latent representation by reducing both the number of functional features and the number of timepoints at which the functions are observed. The effectiveness of the proposed model is demonstrated through multiple simulations and real data examples.
Landing an unmanned aerial vehicle (UAV) on top of an unmanned surface vehicle (USV) in harsh open waters is a challenging problem, because a severe roll and/or pitch angle of the USV during touchdown produces forces that can damage the UAV. To tackle this, we propose a novel model predictive control (MPC) approach enabling a UAV to land autonomously on a USV in these harsh conditions. The MPC employs a novel objective function and an online decomposition of the oscillatory motion of the vessel to predict, attempt, and accomplish the landing during near-zero tilt of the landing platform. The nonlinear prediction of the motion of the vessel is performed using visual data from an onboard camera. Therefore, the system does not require any communication with the USV or a control station. The proposed method was analyzed in numerous robotics simulations in harsh and extreme conditions and further validated in various real-world scenarios.
Multiple studies have focused on predicting the prospective popularity of an online document as a whole, without paying attention to the contributions of its individual parts. We introduce the task of proactively forecasting popularities of sentences within online news documents solely utilizing their natural language content. We model sentence-specific popularity forecasting as a sequence regression task. For training our models, we curate InfoPop, the first dataset containing popularity labels for over 1.7 million sentences from over 50,000 online news documents. To the best of our knowledge, this is the first dataset automatically created using streams of incoming search engine queries to generate sentence-level popularity annotations. We propose a novel transfer learning approach involving sentence salience prediction as an auxiliary task. Our proposed technique coupled with a BERT-based neural model exceeds nDCG values of 0.8 for proactive sentence-specific popularity forecasting. Notably, our study presents a non-trivial takeaway: though popularity and salience are different concepts, transfer learning from salience prediction enhances popularity forecasting. We release InfoPop and make our code publicly available: https://github.com/sayarghoshroy/InfoPopularity
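For readers unfamiliar with the reported metric, nDCG compares the ranking induced by predicted scores against the ideal ranking of the true labels. The sketch below is a minimal implementation under stated assumptions: it uses linear gain with a log2 position discount, and the abstract does not say which nDCG variant or cutoff was used. The function names and the optional cutoff `k` are illustrative.

```python
import math
from typing import Optional, Sequence


def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain with linear gain and log2 discount."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(true_relevance: Sequence[float],
         predicted_scores: Sequence[float],
         k: Optional[int] = None) -> float:
    """nDCG of the ranking induced by predicted_scores, optionally at cutoff k."""
    if len(true_relevance) != len(predicted_scores):
        raise ValueError("inputs must have the same length")
    # Rank items by predicted score, highest first.
    order = sorted(range(len(predicted_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked = [true_relevance[i] for i in order]
    ideal = sorted(true_relevance, reverse=True)
    if k is not None:
        ranked, ideal = ranked[:k], ideal[:k]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0
```

In the sentence-popularity setting, `true_relevance` would hold the popularity labels of a document's sentences and `predicted_scores` the model's forecasts for the same sentences.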